Figure 1. The experimental layout of the 5-day chemical exposures in petri dishes prior to the behavioural assay
Figure 2. The behavioural assay plate layout for an individual experiment. In total, 54 5-day-old embryos, 9 for each dose, are transferred to a 96-well plate in exposure media
Figure 3. A visual representation of the behavioural assay protocol
Figure 4. An example of what the raw swim-path tracing looks like from the Viewpoint Zebrabox infrared camera and Viewpoint Zebralab software
The raw data contain many variables that we will explore once we import the data. You can browse the metadata in the next section of this report.
.XLS files have been converted to .csv files and are included in the directory /home/joryc/Downloads/GNU Zip Files/Collaborator_Data/Data. These are the raw data files that this EDA will be using.
| Animal | Chemical | Dose |
|---|---|---|
| Animal01 | DMSO | 0 |
| Animal02 | DMSO | 0 |
| Animal03 | DMSO | 0 |
| Animal04 | DMSO | 0 |
| Animal05 | DMSO | 0 |
| Animal06 | DMSO | 0 |
| Animal07 | DMSO | 0 |
| Animal08 | DMSO | 0 |
| Animal09 | DMSO | 0 |
| Animal10 | DMSO | 0 |
| Animal11 | DMSO | 0 |
| Animal12 | DMSO | 0 |
| Animal13 | PFOS | 0.1 |
| Animal14 | PFOS | 0.1 |
| Animal15 | PFOS | 0.1 |
| Animal16 | PFOS | 0.1 |
| Animal17 | PFOS | 0.1 |
| Animal18 | PFOS | 0.1 |
| Animal19 | PFOS | 0.1 |
| Animal20 | PFOS | 0.1 |
| Animal21 | PFOS | 0.1 |
| Animal22 | PFOS | 0.1 |
| Animal23 | PFOS | 0.1 |
| Animal24 | PFOS | 0.1 |
| Animal25 | PFOS | 1 |
| Animal26 | PFOS | 1 |
| Animal27 | PFOS | 1 |
| Animal28 | PFOS | 1 |
| Animal29 | PFOS | 1 |
| Animal30 | PFOS | 1 |
| Animal31 | PFOS | 1 |
| Animal32 | PFOS | 1 |
| Animal33 | PFOS | 1 |
| Animal34 | PFOS | 1 |
| Animal35 | PFOS | 1 |
| Animal36 | PFOS | 1 |
| Animal37 | OBS | 0.1 |
| Animal38 | OBS | 0.1 |
| Animal39 | OBS | 0.1 |
| Animal40 | OBS | 0.1 |
| Animal41 | OBS | 0.1 |
| Animal42 | OBS | 0.1 |
| Animal43 | OBS | 0.1 |
| Animal44 | OBS | 0.1 |
| Animal45 | OBS | 0.1 |
| Animal46 | OBS | 0.1 |
| Animal47 | OBS | 0.1 |
| Animal48 | OBS | 0.1 |
| Animal49 | OBS | 1 |
| Animal50 | OBS | 1 |
| Animal51 | OBS | 1 |
| Animal52 | OBS | 1 |
| Animal53 | OBS | 1 |
| Animal54 | OBS | 1 |
| Animal55 | OBS | 1 |
| Animal56 | OBS | 1 |
| Animal57 | OBS | 1 |
| Animal58 | OBS | 1 |
| Animal59 | OBS | 1 |
| Animal60 | OBS | 1 |
| Animal61 | F53B | 0.1 |
| Animal62 | F53B | 0.1 |
| Animal63 | F53B | 0.1 |
| Animal64 | F53B | 0.1 |
| Animal65 | F53B | 0.1 |
| Animal66 | F53B | 0.1 |
| Animal67 | F53B | 0.1 |
| Animal68 | F53B | 0.1 |
| Animal69 | F53B | 0.1 |
| Animal70 | F53B | 0.1 |
| Animal71 | F53B | 0.1 |
| Animal72 | F53B | 0.1 |
| Animal73 | F53B | 1 |
| Animal74 | F53B | 1 |
| Animal75 | F53B | 1 |
| Animal76 | F53B | 1 |
| Animal77 | F53B | 1 |
| Animal78 | F53B | 1 |
| Animal79 | F53B | 1 |
| Animal80 | F53B | 1 |
| Animal81 | F53B | 1 |
| Animal82 | F53B | 1 |
| Animal83 | F53B | 1 |
| Animal84 | F53B | 1 |
| Animal85 | NA | NA |
| Animal86 | NA | NA |
| Animal87 | NA | NA |
| Animal88 | NA | NA |
| Animal89 | NA | NA |
| Animal90 | NA | NA |
| Animal91 | NA | NA |
| Animal92 | NA | NA |
| Animal93 | NA | NA |
| Animal94 | NA | NA |
| Animal95 | NA | NA |
| Animal96 | NA | NA |
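The layout in the table above follows a regular pattern: eight blocks of 12 consecutive wells, with the last block left empty. As a minimal sketch in base R (the object names `groups` and `plate_map` are illustrative and do not come from the original analysis), the same table could be generated rather than typed by hand:

```r
# Hypothetical reconstruction of the plate map shown in the table above.
# One row per treatment group, in plate order; the final group is the empty wells.
groups <- data.frame(
  Chemical = c("DMSO", "PFOS", "PFOS", "OBS", "OBS", "F53B", "F53B", NA),
  Dose     = c(0, 0.1, 1, 0.1, 1, 0.1, 1, NA)
)

wells_per_group <- 12

plate_map <- data.frame(
  Animal   = sprintf("Animal%02d", 1:96),                 # Animal01 ... Animal96
  Chemical = rep(groups$Chemical, each = wells_per_group),
  Dose     = rep(groups$Dose, each = wells_per_group)
)

# Number of wells per chemical, counting the empty (NA) wells as well
table(plate_map$Chemical, useNA = "ifany")
```

Generating the map this way makes it easy to confirm that 84 wells carry a treatment and 12 are empty.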
Glimpse the raw data to see the structure of each variable, the number of observations and the class of the raw_data object
## Rows: 9,790
## Columns: 16
## $ animal <chr> "Animal01", "Animal01", "Animal02", "Animal02", "Animal03", …
## $ Treatment <chr> "DMSO", "DMSO", "DMSO", "DMSO", "DMSO", "DMSO", "DMSO", "DMS…
## $ an <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, T…
## $ start <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ end <dbl> 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, …
## $ inact <int> 54, 54, 65, 65, 50, 50, 60, 60, 103, 103, 82, 82, 14, 14, 69…
## $ inadur <dbl> 18.5, 18.5, 7.7, 7.7, 7.5, 7.5, 10.2, 10.2, 24.1, 24.1, 16.2…
## $ inadist <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ smlct <int> 175, 175, 291, 291, 251, 251, 246, 246, 196, 196, 104, 104, …
## $ smldur <dbl> 29.5, 29.5, 25.9, 25.9, 18.2, 18.2, 26.5, 26.5, 29.6, 29.6, …
## $ smldist <dbl> 606.0, 606.0, 698.8, 698.8, 423.4, 423.4, 575.4, 575.4, 472.…
## $ larct <int> 139, 139, 288, 288, 262, 262, 237, 237, 160, 160, 25, 25, 38…
## $ lardur <dbl> 6.7, 6.7, 19.8, 19.8, 16.4, 16.4, 17.5, 17.5, 6.3, 6.3, 1.2,…
## $ lardist <dbl> 365.0, 365.0, 1139.7, 1139.7, 1119.2, 1119.2, 889.3, 889.3, …
## $ emptyct <int> 58, 0, 118, 0, 195, 0, 61, 0, 1, 0, 1, 0, 15, 0, 2, 0, 301, …
## $ emptydur <dbl> 5.2, 0.0, 6.6, 0.0, 17.9, 0.0, 5.8, 0.0, 0.0, 0.0, 0.1, 0.0,…
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
From the glimpse it can be seen that there are 16 variables in the tibble/data frame:
- animal: the individual animal in the experiment
- Treatment: the chemical and dose group
- an: unknown/not useful
- start: start time of the observation, in seconds
- end: end time of the observation, in seconds
- inact: Inactivity Counts | the number of times the fish went from being active to inactive over the observation time
- inadur: Inactivity Duration | the duration of time, in seconds, the fish spent inactive over the observation time (1 minute)
- inadist: Inactivity Distance | the distance travelled during inactive observations (this value should be 0)
- smlct: Small Activity Counts | the number of times the fish had a small burst of swim activity over the observation time (1 minute)
- smldur: Small Activity Duration | the duration of the small bursts of swim activity over the observation time (1 minute)
- smldist: Small Activity Distance | the distance travelled during small bursts of activity
- larct: Large Activity Counts | the number of times the fish had a large burst of swim activity over the observation time (1 minute)
- lardur: Large Activity Duration | the duration of the large bursts of swim activity over the observation time (1 minute)
- lardist: Large Activity Distance | the distance travelled during large bursts of activity
- emptyct: counts that were neither inactive nor active (data recording artifact)
- emptydur: the duration of time the fish was neither inactive nor active (data recording artifact) | Almost acts like a confidence value. The closer it is to 60, the more unreliable the data are
It is expected that the raw data will have 4800 rows because there are 96 wells and the assay is 50 minutes long, with one 60-second observation per well per minute (96 × 50 = 4800).
nrow(raw_data)
## [1] 9790
However, there are 9790 rows present in the raw data. The expectation
was violated because each observation is duplicated and there are some
extra observations.
By looking at just the head of raw_data, it can be seen
that the variable an has a TRUE and
FALSE row for each individual observation. The only
difference between these duplicate rows is that the FALSE
rows retain information about emptyct and
emptydur.
raw_data <- raw_data %>%
  filter(!an) %>% # Keep the FALSE row of each duplicate pair (it retains emptyct and emptydur)
  select(-an) # Drop an, which is no longer useful
nrow(raw_data)
## [1] 4895
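To make the de-duplication step concrete, here is a small base R sketch on a toy data frame. The values are copied from the first rows of the glimpse output, but the object names `toy`, `shared` and `deduped` are hypothetical, and only a handful of the 16 columns are included. It checks that the TRUE and FALSE rows agree on the shared columns and then keeps only the FALSE rows:

```r
# Toy data: two animals, each recorded twice (an == FALSE and an == TRUE)
toy <- data.frame(
  animal   = c("Animal01", "Animal01", "Animal02", "Animal02"),
  an       = c(FALSE, TRUE, FALSE, TRUE),
  end      = c(60, 60, 60, 60),
  inact    = c(54L, 54L, 65L, 65L),
  emptyct  = c(58L, 0L, 118L, 0L),
  emptydur = c(5.2, 0.0, 6.6, 0.0)
)

# Columns that should be identical within each duplicate pair
shared <- c("animal", "end", "inact")
pairs_agree <- all(toy[!toy$an, shared] == toy[toy$an, shared])

# Keep the FALSE rows (they retain emptyct/emptydur) and drop the an column
deduped <- toy[!toy$an, setdiff(names(toy), "an")]

pairs_agree
nrow(deduped)
```

Checking that the pairs agree before dropping half of the rows guards against discarding rows that only look like duplicates.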
After filtering for just the FALSE values, there are now 4895 rows.
There are 95 extra observations in raw_data (4895 - 4800) because
some observations were recorded past 50 minutes.
raw_data <- raw_data %>%
filter(end <= 3000) # Deleting observations past 50 mins (3000 seconds)
nrow(raw_data)
## [1] 4800
identical(as.numeric(nrow(raw_data)), (96 * 50 * 1)) # is the expected number of rows consistent with the observed number of rows after processing?
## [1] TRUE
After ensuring there are no observations past the 50-minute mark, there are now 4800 rows, as expected.
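The row-count bookkeeping in this section can be checked with a few lines of arithmetic. This is a sketch, assuming one 60-second observation per well per minute; the variable names are illustrative only:

```r
n_wells   <- 96    # wells on the plate
n_minutes <- 50    # assay length in minutes, one observation per well per minute

expected_rows <- n_wells * n_minutes          # rows expected after cleaning
raw_rows      <- 9790                         # rows reported by nrow(raw_data) on import
deduped_rows  <- raw_rows / 2                 # each observation appears twice (an TRUE/FALSE)
extra_rows    <- deduped_rows - expected_rows # observations recorded past 50 minutes

c(expected = expected_rows, deduped = deduped_rows, extra = extra_rows)
```

The difference between the de-duplicated count and the expected count is the number of rows removed by the `end <= 3000` filter.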
It is expected that wells 85:96 will all be NA in the
Treatment column because these are all empty wells (Figure
2). The NA treatments should therefore account for 600
observations (12 wells × 50 minutes). After removing the NAs there should be 4200
rows in raw_data (84 wells × 50 minutes).
raw_data <- raw_data %>%
  filter(!is.na(Treatment)) # Drop the empty wells, which have no treatment
nrow(raw_data)
## [1] 4200
identical(as.numeric(nrow(raw_data)), (84 * 50 * 1)) # Does the expected number of rows match the observed number of rows after filtering?
## [1] TRUE
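As a small illustration of removing the empty wells, the sketch below filters a toy data frame with `!is.na()`, which tests for missingness directly, and recomputes the expected row counts. The toy values and object names are hypothetical:

```r
# Toy data: one treated well and two empty wells
toy <- data.frame(
  animal    = c("Animal01", "Animal85", "Animal96"),
  Treatment = c("DMSO", NA, NA)
)

# Keep only rows where Treatment is present
kept <- toy[!is.na(toy$Treatment), ]

# Expected counts for the full data set
n_minutes      <- 50
empty_wells    <- 12
treated_wells  <- 96 - empty_wells
rows_removed   <- empty_wells * n_minutes    # observations from empty wells
rows_remaining <- treated_wells * n_minutes  # observations from treated wells

nrow(kept)
c(removed = rows_removed, remaining = rows_remaining)
```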
Figure 5. Quick plots of the ‘counts’ variables in the
raw_data object
The emptyct variable (plot 1) is a good tool to use for
flagging observations that need to be transformed to NAs in
the next section (Suspicious Values). Inactivity counts (inact) are most commonly around 60
per minute. As well, it can be seen that small swim bursts
(smlct) tend to occur just over 225 times per minute. And
finally, we can see that large swim bursts (larct)
occur either just under 200 times per minute or 0 times per minute.
This could be due to light inhibiting swimming, or
darkness stimulating large swim behaviours.
Figure 6. Quick plots of the ‘duration’ variables in the
raw_data object
Figure 6 reveals another red flag with the emptydur
variable (plot 1 in figure 6). There are some observations that show 60
full seconds of being empty! This is likely more than just an artifact
in the recording instrument/software. These are likely dead or immobile
fish that never moved at all so the infrared camera was never able to
start tracing their swim patterns (during that 60-second observation
period). However, it can also be seen that there are some observations
greater than 0 and less than 60 in this plot. In theory, if an animal is
present in the well, the emptydur value should always be
zero. An arbitrary threshold of 20 seconds of empty duration will be
used: all observations (across variables) with an
emptydur > 20 will be transformed into NAs.
Again, it can also be seen that the distribution of observations for
each variable is slightly skewed. Note also that inadur and
activedur (plots 2 and 3 from Figure 6) are approximately
inversely related, as expected. Large swim
activity duration (lardur) clusters around two modes (0 s
and ~15 s), similar to the pattern seen in the counts variables.
Figure 7. Quick plots of the ‘distance’ variables in the
raw_data object
Figure 7 shows that the distributions of the ‘distance’ variables
are slightly skewed, with the totaldist variable being the
closest to normally distributed. Notably, totaldist seems to be a
promising effect endpoint to analyse since it is the closest to
normal of all the effect endpoints.
Figure 6 showed that the emptydur variable can be used
reliably to filter out observations with poor data quality. The ‘empty
duration’ variable ranges from 0 s to 60 s and indicates how long the
camera was unable to detect a fish in the well during an observation. An arbitrary
cutoff value of 20 seconds will be used to determine whether an observation
was of poor quality and should therefore be converted to
NA. By doing this, the confidence in the accuracy of
observations can be increased across the entire data set.
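The flagging rule described above can be sketched in base R as follows. This is not the original analysis code: the toy values, the `threshold` and `endpoints` names, and the choice of which columns count as behavioural endpoints are assumptions for illustration. Only the `emptydur > 20` rule and the `poorQual` column name come from this report:

```r
# Toy data: four observations with increasing empty duration
toy <- data.frame(
  animal   = c("Animal01", "Animal02", "Animal03", "Animal04"),
  emptydur = c(0.0, 5.2, 20.0, 60.0),
  inact    = c(54L, 65L, 50L, 0L),
  smldist  = c(606.0, 698.8, 423.4, 0.0)
)

threshold <- 20                      # arbitrary cutoff, in seconds
endpoints <- c("inact", "smldist")   # assumed subset of behavioural endpoint columns

# Flag poor-quality observations; exactly 20 s is NOT flagged because the rule is strict (>)
toy$poorQual <- toy$emptydur > threshold

# Replace the behavioural endpoints of flagged rows with NA, keeping the rows themselves
toy[toy$poorQual, endpoints] <- NA

toy
```

Keeping the flagged rows and blanking only their endpoint values preserves the one-row-per-well-per-minute structure, so later row-count checks still hold.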
NAObservations <- raw_data %>%
  filter(poorQual) %>% # Keep only poor-quality rows, whose behavioural endpoints were turned to NAs
  nrow()
NAObservations
## [1] 466
A total of 466 60-second observations (rows) were transformed to
NAs across all of the behavioural endpoint observation
variables.
Overall, the animal recording set-up had an approximate failure rate of
11 % (466 of 4200 observations), the percentage of observations in which the infrared camera failed to
adequately detect an animal that was present in a well.
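The failure rate quoted above follows directly from the two row counts reported earlier. A short check, with illustrative variable names:

```r
n_flagged <- 466    # observations flagged as poor quality
n_total   <- 4200   # observations from treated wells after cleaning

failure_rate <- 100 * n_flagged / n_total  # percentage of flagged observations
round(failure_rate, 1)
```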
#### Dunnett_Results_Table